29 research outputs found

    Executable Pseudocode for Graph Algorithms

    Get PDF
    Algorithms are usually presented in pseudocode. However, the implementation of an algorithm in a conventional, imperative programming language can often be scattered over hundreds of lines of code, thus obscuring its essence. This can lead to difficulties in understanding or verifying the code, and adapting or varying the original algorithm can be laborious. We present a case study showing the use of Common Lisp macros to provide an embedded, domain-specific language for graph algorithms. This allows these algorithms to be presented in Lisp in a form directly comparable to their pseudocode, allowing rapid prototyping at the algorithm level. As a proof of concept, we implement Brandes' algorithm for computing the betweenness centrality of a graph and see how our implementation compares favourably with state-of-the-art implementations in imperative programming languages, not only in terms of clarity and verisimilitude to the pseudocode, but also in execution speed.
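Brandes' algorithm is compact enough to state directly. As a point of comparison for the pseudocode-level presentation discussed above, here is a minimal sketch in Python (not the paper's Lisp DSL), computing unnormalized betweenness centrality on an unweighted graph given as an adjacency dict:

```python
from collections import deque

def brandes_betweenness(graph):
    """Unweighted betweenness centrality (Brandes 2001).
    graph: dict mapping each node to an iterable of its neighbours."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Phase 1: BFS from s, counting shortest paths.
        stack = []
        pred = {v: [] for v in graph}    # predecessors on shortest paths
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:              # w discovered for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # shortest path to w runs via v
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Phase 2: back-propagate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On the path graph 0-1-2 the middle node lies on both directed shortest paths between the endpoints, so its unnormalized score is 2.0.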

    Discovering Motifs in Real-World Social Networks

    Get PDF
    We built a framework for analyzing the contents of large social networks, based on the approximate counting technique developed by Gonen and Shavitt. Our toolbox was used on data from a large forum, boards.ie, the most prominent community website in Ireland. For the purpose of this experiment, we were granted access to 10 years of forum data. This is the first time the approximate counting technique has been tested on real-world social network data.
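The Gonen-Shavitt estimator itself is more involved, but the underlying idea of approximate motif counting by sampling can be sketched simply. The example below is an illustrative wedge-sampling triangle estimator in Python, not the paper's actual toolbox: sample length-2 paths in proportion to their count at each centre node and check how often they close.

```python
import random

def approx_triangles(adj, samples=2000, seed=0):
    """Estimate the number of triangles by uniform wedge sampling.
    adj: dict mapping each node to a set of neighbours (undirected)."""
    rng = random.Random(seed)
    nodes = [v for v in adj if len(adj[v]) >= 2]
    # Number of wedges (length-2 paths) centred at each candidate node.
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in nodes]
    total_wedges = sum(weights)
    if total_wedges == 0:
        return 0.0
    closed = 0
    for _ in range(samples):
        v = rng.choices(nodes, weights=weights)[0]   # pick a wedge centre
        a, b = rng.sample(sorted(adj[v]), 2)         # pick its two endpoints
        if b in adj[a]:                              # wedge closes a triangle
            closed += 1
    # Each triangle contains exactly 3 wedges, hence the division by 3.
    return closed / samples * total_wedges / 3
```

On a complete graph every wedge is closed, so the estimator is exact there; on a triangle-free graph it returns 0.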

    A multiphysics and multiscale software environment for modeling astrophysical systems

    Get PDF
    We present MUSE, a software framework for combining existing computational tools for different astrophysical domains into a single multiphysics, multiscale application. MUSE facilitates the coupling of existing codes written in different languages by providing inter-language tools and by specifying an interface between each module and the framework that represents a balance between generality and computational efficiency. This approach allows scientists to use combinations of codes to solve highly-coupled problems without the need to write new codes for other domains or significantly alter their existing codes. MUSE currently incorporates the domains of stellar dynamics, stellar evolution and stellar hydrodynamics for studying generalized stellar systems. We have now reached a "Noah's Ark" milestone, with (at least) two available numerical solvers for each domain. MUSE can treat multi-scale and multi-physics systems in which the time- and size-scales are well separated, like simulating the evolution of planetary systems, small stellar associations, dense stellar clusters, galaxies and galactic nuclei. In this paper we describe three examples calculated using MUSE: the merger of two galaxies, the merger of two evolving stars, and a hybrid N-body simulation. In addition, we demonstrate an implementation of MUSE on a distributed computer which may also include special-purpose hardware, such as GRAPEs or GPUs, to accelerate computations. The current MUSE code base is publicly available as open source at http://muse.li

    Comment: 24 pages. To appear in New Astronomy. Source code available at http://muse.l
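The kind of module interface the framework specifies can be sketched abstractly. The following Python sketch is hypothetical (MUSE's real interface and method names differ): each domain solver exposes the same minimal contract, and a coupling loop advances all modules over shared sub-steps.

```python
from abc import ABC, abstractmethod

class DomainModule(ABC):
    """Hypothetical solver interface in the spirit of MUSE: each domain
    code is wrapped so the framework can drive it uniformly."""
    @abstractmethod
    def initialize(self, particles): ...
    @abstractmethod
    def evolve(self, t_end): ...
    @abstractmethod
    def get_state(self): ...

class ToyDynamics(DomainModule):
    """Trivial stand-in solver: particles are (position, velocity) pairs."""
    def initialize(self, particles):
        self.state = list(particles)
        self.t = 0.0
    def evolve(self, t_end):
        # Drift each particle by its velocity over the requested interval.
        dt = t_end - self.t
        self.state = [(x + v * dt, v) for x, v in self.state]
        self.t = t_end
    def get_state(self):
        return self.state

def couple(modules, t_end, n_steps):
    """Operator-split coupling: advance every module over shared sub-steps,
    so well-separated time-scales can exchange state at each boundary."""
    for k in range(1, n_steps + 1):
        t = t_end * k / n_steps
        for m in modules:
            m.evolve(t)
    return [m.get_state() for m in modules]
```

The point of the pattern is that a second solver with the same three methods could be dropped into the `modules` list without changing the coupling loop.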

    Extracting causal relations on HIV drug resistance from literature

    Get PDF
    Background: In HIV treatment it is critical to have up-to-date resistance data for the applicable drugs, since HIV has a very high rate of mutation. These data are made available through scientific publications and must be extracted manually by experts in order to be used by virologists and medical doctors. There is therefore an urgent need for a tool that partially automates this process and is able to retrieve relations between drugs and virus mutations from the literature. Results: In this work we present a novel method to extract and combine relationships between HIV drugs and mutations in viral genomes. Our extraction method is based on natural language processing (NLP), which produces grammatical relations, and applies a set of rules to these relations. We applied our method to a relevant set of PubMed abstracts and obtained 2,434 extracted relations with an estimated performance of 84% F-score. We then combined the extracted relations using logistic regression to generate resistance values for each <drug, mutation> pair. The results of this relation combination show more than 85% agreement with the Stanford HIVDB for the ten most frequently occurring mutations. The system is used in 5 hospitals from the Virolab project (http://www.virolab.org) to preselect the most relevant novel resistance data from the literature and present those to virologists and medical doctors for further evaluation. Conclusions: The proposed relation extraction and combination method performs well on extracting HIV drug resistance data and can be used in large-scale relation extraction experiments. The developed methods can also be applied to extract other types of relations, such as gene-protein, gene-disease, and disease-mutation.
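The rule-based extraction step can be illustrated in miniature. The sketch below is a deliberately simplified Python stand-in for the paper's pipeline: a single surface pattern plays the role of the rules applied to grammatical relations, and the example sentence is illustrative.

```python
import re

# Illustrative only: one surface pattern standing in for the paper's
# rule set over grammatical relations produced by a full NLP parser.
RULE = re.compile(
    r"(?P<mutation>[A-Z]\d+[A-Z])\s+confers resistance to\s+(?P<drug>\w+)"
)

def extract_relations(text):
    """Return (mutation, drug, 'resistance') triples matched by the rule."""
    return [(m.group("mutation"), m.group("drug"), "resistance")
            for m in RULE.finditer(text)]
```

A real system would run many such rules over parser output rather than raw text, then feed the triples to the logistic-regression combination step.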

    Comparison of HIV-1 Genotypic Resistance Test Interpretation Systems in Predicting Virological Outcomes Over Time

    Get PDF
    Background: Several decision support systems have been developed to interpret HIV-1 drug resistance genotyping results. This study compares the ability of the most commonly used systems (ANRS, Rega, and Stanford's HIVdb) to predict virological outcome at 12, 24, and 48 weeks. Methodology/Principal Findings: Included were 3763 treatment-change episodes (TCEs) for which a HIV-1 genotype was available at the time of changing treatment, with at least one follow-up viral load measurement. Genotypic susceptibility scores for the active regimens were calculated using scores defined by each interpretation system. Using logistic regression, we determined the association between the genotypic susceptibility score and the proportion of TCEs having an undetectable viral load (<50 copies/ml) at 12 (8-16) weeks (2152 TCEs), 24 (16-32) weeks (2570 TCEs), and 48 (44-52) weeks (1083 TCEs). The area under the ROC curve was calculated using 10-fold cross-validation to compare the interpretation systems' sensitivity and specificity for predicting an undetectable viral load. The mean genotypic susceptibility score was slightly smaller for HIVdb, at 1.92±1.17, compared to Rega and ANRS, at 2.22±1.09 and 2.23±1.05, respectively. However, similar odds ratios were found for the association between each unit increase in genotypic susceptibility score and undetectable viral load at week 12: 1.6 [95% confidence interval 1.5-1.7] for HIVdb, 1.7 [1.5-1.8] for ANRS, and 1.7 [1.6-1.9] for Rega. Odds ratios increased over time but remained comparable (odds ratios ranging between 1.9-2.1 at 24 weeks and 1.9-2.
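The reported per-unit odds ratios follow directly from the logistic model: if the log-odds of an undetectable viral load are linear in the genotypic susceptibility score, each unit increase multiplies the odds by exp(beta). A small Python check, with made-up intercept and slope values:

```python
import math

def p_undetectable(intercept, beta, score):
    """Logistic model: probability of an undetectable viral load
    (<50 copies/ml) as a function of the genotypic susceptibility score."""
    return 1.0 / (1.0 + math.exp(-(intercept + beta * score)))

def odds(p):
    return p / (1.0 - p)

# Illustrative values only: a per-unit odds ratio of 1.7 (as reported for
# ANRS and Rega at week 12) corresponds to beta = ln(1.7).
beta = math.log(1.7)
intercept = -1.0
or_per_unit = (odds(p_undetectable(intercept, beta, 1))
               / odds(p_undetectable(intercept, beta, 0)))
```

The ratio is independent of the intercept and of which two adjacent scores are compared, which is why a single odds ratio summarizes the whole model.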

    A niche width model of optimal specialization

    Get PDF
    Niche width theory, a part of organizational ecology, predicts whether “specialist” or “generalist” forms of organizations have higher “fitness” in a continually changing environment. To this end, niche width theory uses a mathematical model borrowed from biology. In this paper, we first loosen the specialist-generalist dichotomy, so that we can predict the optimal degree of specialization. Second, we generalize the model to a larger class of environmental conditions, on the basis of the model’s underlying assumptions. Third, we criticize the way the biological model is treated in sociological theory: two of the model’s dimensions, trait and environment, seem to be confused; the predicted optimal specialization is a property of individual organizations, not of populations; and the distinction between “fine-grained” and “coarse-grained” environments is superfluous.
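The trade-off the model captures can be illustrated numerically. The Python sketch below uses a hypothetical Gaussian fitness function, not the paper's exact model: widening the niche flattens the performance peak, so the optimal width grows with environmental variability.

```python
import math

def expected_fitness(width, env_states, trait=0.0):
    """Illustrative fitness function (not the paper's): performance in a
    given environment state falls off with distance from the trait optimum,
    and a wider niche lowers the peak (the specialism-generalism trade-off).
    Expected fitness is the mean over the environment states."""
    def f(e):
        # Peak height 1/width: generalists do worse in any single state.
        return math.exp(-((e - trait) ** 2) / (2 * width ** 2)) / width
    return sum(f(e) for e in env_states) / len(env_states)

def optimal_width(env_states, grid):
    """Degree of specialization (niche width) maximizing expected fitness."""
    return max(grid, key=lambda w: expected_fitness(w, env_states))
```

In a stable environment the narrowest niche on the grid wins; once the environment alternates between distant states, an intermediate width is optimal, which is the continuous version of the specialist-generalist prediction.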

    Steven de Rooij — Methods of Statistical Data Compression

    No full text
    Data compression is important not only for conserving resources; it also has applications in cryptography and it can be used as an estimator for redundancy in the data: this has many applications, such as prediction, classification and other difficult problems in machine learning. We study algorithms that perform lossless statistical data compression. Statistical data compression is attractive because it allows for separation of the problems of modelling and coding, both of which will be treated here. It seems safe to say that with the development of arithmetic coding in 1976, the problem of coding has been solved satisfactorily, while the problem of modelling remains very difficult to this day. We will restrict ourselves to online modelling. In chapter 2 we study the theoretical background of statistical data compression, relating results of information theory and probability theory to coding and modelling. Then we focus on more concrete issues: in chapter 3 we treat an adaptation of Ukkonen’s algorithm for the online construction of suffix trees, whic
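The separation of modelling and coding can be made concrete: an arithmetic coder compresses a sequence to essentially -log2 P(x) bits under whatever model supplies the probabilities. The Python sketch below computes that ideal code length for a simple online model (the Laplace rule of succession); it is an illustration of the principle, not an implementation from the thesis.

```python
import math

def laplace_code_length(bits):
    """Ideal code length in bits that an arithmetic coder achieves for a
    binary sequence under the Laplace rule-of-succession model:
    P(next = 1) = (ones_so_far + 1) / (symbols_so_far + 2)."""
    ones = zeros = 0
    total = 0.0
    for b in bits:
        p_one = (ones + 1) / (ones + zeros + 2)
        p = p_one if b == 1 else 1.0 - p_one
        total += -math.log2(p)   # bits charged for this symbol
        if b == 1:
            ones += 1
        else:
            zeros += 1
    return total
```

For a run of n identical symbols the model assigns total probability 1/(n+1), so the code length is log2(n+1) bits, far below the n bits of a uniform model; this gap is exactly what a better online model buys the (fixed) arithmetic coder.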